Ecology In Your Life - BIOL 230 W2021¶
Written by: Kaede Ito
Student Number: 58847666
Ecology Question¶
Given ocean temperatures off the coast of British Columbia, what could be the effect of the 2021 summer Western North America heat wave and how could it have changed the community of algae species (diversity, abundance, new species establishment)?
Data Sources¶
For this project, I need data of the ocean surface / close-to-surface temperatures during the summer heatwave period (mid June to late July) (source_link) and perferably also it’s coordinates. This would mean I’d need:
ocean temperatures (close to the surface)
time and date in the right range
coordinates
I’d also need some sort of anecdotal / observational / measured information of the algae during that time, or based on the known list of algae that resides off the coast of British Columbia, their tolerable temperature ranges.
Ocean Temperatures¶
Underway meteorological, navigational, optical, physical, and time series data collected aboard NOAA Ship Ronald H. Brown in the Coastal Waters of Southeast Alaska and British Columbia, Columbia River estuary - Washington/Oregon and others from 2021-06-13 to 2021-07-26 (NCEI Accession 0240415)
Analysis¶
Analysis and visualization was done using R and various packages. The following is the script used to generate 2 scatterplot graphs.
Setup¶
library(tidyverse)
library(lubridate)
library(ggplot2)
library(plotly)
options(repr.plot.width=10, repr.plot.height=6)
Reading and Wrangling Data¶
Temperature data is found in the "data" folder, while coordinates (and the time recorded) are in the "nav" folder inside the "data" folder.
Based on the metadata that was provided alongside the raw data (as broken down above), we need the external temperatures recorded in the SBE45-TSG-MSG_20210XXX-XXXXXX.Raw files and SST-TSG-Temp-Diff-MSG_20210XXX-XXXXXX.Raw files.
We also need the GPS data recorded in the Primary-GPS-GGA_XXX-XXXXXX.Raw files.
However, all of the data needs some preprocessing/cleaning up before it can be used in making the graphs.
Some major data cleaning has been done using a Python script, located in the data_cleaning folder. No packages were used, and can be used as long as a v3.9 Python is installed (anything above or below is untested) and the scripts are pointed to the right data sources.
clean_SBE45_data <- function(x) {
read <- read_delim(x, delim = ",",
col_names = c("date",
"time",
"int_temp",
"conductivity",
"salinity",
"sound_vel",
"ext_temp")) %>%
select(date, time, ext_temp)
return(read)
}
clean_STT_TSG_data <- function(x) {
read <- read_delim(x, delim = ",",
col_names = c("date",
"time",
"type",
"diff",
"ext_temp",
"int_temp")) %>%
select(date, time, ext_temp)
return(read)
}
clean_temp_data <- function(x) {
# https://stackoverflow.com/questions/10128617/test-if-characters-are-in-a-string
if(grepl("SBE45-TSG-MSG", x, fixed = TRUE)) {
return(clean_SBE45_data(x))
} else {
return(clean_STT_TSG_data(x))
}
}
clean_nav_data <- function(x) {
read <- read_csv(x,
col_names = c(
"date",
"time",
"type",
"time_num",
"lat",
"lat_NS",
"long",
"long_WE",
"gps_quality",
"num_sat_view",
"hort_dil",
"ant_alt",
"ant_alt_unit",
"geoidal",
"geoidal_unit",
"age_diff",
"diff_station",
"checksum"
)) %>%
select(date, time, lat, lat_NS, long, long_WE) %>%
mutate(long_WE = ifelse((long_WE == "" | long_WE == NA), "W", long_WE)) %>%
mutate(lat_NS = ifelse((lat_NS == "" | lat_NS == NA), "N", lat_NS)) %>%
mutate(lat_NS = as.factor(lat_NS), long_WE = as.factor(long_WE))
return(read)
}
The dates are all of type character, meaning extracting any use without it being a proper date type is hard. Therefore, time and date must be formatted.
format_datetime <- function(df) {
df_new <- df %>%
# https://www.neonscience.org/resources/learning-hub/tutorials/dc-time-series-subset-dplyr-r
mutate(date = as.Date(date, format = '%m/%d/%Y')) %>%
# https://www.tidyverse.org/blog/2021/03/clock-0-1-0/
mutate(datetime = as.POSIXct(date, "America/Vancouver")) %>%
mutate(datetime = datetime +hour(time)+ minute(time))
return(df_new)
}
We have all of the functions needed to clean up 1 file. However, we have quite a few files, and trying to clean and instantiate each by hand is cumbersome. Therefore, we will iterate through all of the files and summarize. The data is summarized as follows:
mean temperature/min of that day
mean latitude/min of that day
mean longitude/min of that day
Attention
Please keep in mind that the following code blocks will take pretty long to run.
# https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once
all_temperature_loaded <- list.files(path = "data/",
pattern = "*.Raw",
full.names = T) %>%
map_df(~clean_temp_data(.))
head(all_temperature_loaded)
summary(all_temperature_loaded)
| date | time | ext_temp |
|---|---|---|
| 06/13/2021 | 19:42:03 | 19.5731 |
| 06/13/2021 | 19:42:04 | 19.5240 |
| 06/13/2021 | 19:42:05 | 19.4795 |
| 06/13/2021 | 19:42:06 | 19.4724 |
| 06/13/2021 | 19:42:07 | 19.4734 |
| 06/13/2021 | 19:42:08 | 19.4718 |
date time ext_temp
Length:7268078 Length:7268078 Min. : 1.00
Class :character Class1:hms 1st Qu.:13.37
Mode :character Class2:difftime Median :15.04
Mode :numeric Mean :15.17
3rd Qu.:16.68
Max. :24.22
NA's :2317
all_temperature <- all_temperature_loaded %>%
filter(!is.na(ext_temp)) %>%
filter(ext_temp > 2) %>%
format_datetime() %>%
group_by(datetime) %>%
summarize(mean_ext = mean(ext_temp, na.rm = TRUE))
head(all_temperature)
summary(all_temperature)
| datetime | mean_ext |
|---|---|
| 2021-06-12 17:00:20 | 18.94975 |
| 2021-06-12 17:00:21 | 18.81057 |
| 2021-06-12 17:00:22 | 19.17494 |
| 2021-06-12 17:00:23 | 19.26122 |
| 2021-06-12 17:00:24 | 19.28124 |
| 2021-06-12 17:00:25 | 19.20406 |
datetime mean_ext
Min. :2021-06-12 17:00:20 Min. :10.17
1st Qu.:2021-06-23 17:00:13 1st Qu.:13.35
Median :2021-07-04 17:00:06 Median :14.92
Mean :2021-07-04 07:00:13 Mean :15.28
3rd Qu.:2021-07-15 11:00:20 3rd Qu.:16.71
Max. :2021-07-25 17:01:16 Max. :21.71
all_nav_loaded <- list.files(path = "data/nav/",
pattern = "*.Raw",
full.names = T) %>%
map_df(~clean_nav_data(.))
all_nav <- all_nav_loaded %>%
format_datetime() %>%
group_by(datetime, long_WE, lat_NS) %>%
summarize(mean_lat = mean(lat), mean_long = mean(long)) %>%
mutate(mean_lat = mean_lat/100, mean_long= mean_long/100)
head(all_nav)
summary(all_nav)
| datetime | long_WE | lat_NS | mean_lat | mean_long |
|---|---|---|---|---|
| 2021-06-12 17:00:16 | NA | NA | 32.41784 | 117.0942 |
| 2021-06-12 17:00:17 | NA | NA | 32.41788 | 117.0941 |
| 2021-06-12 17:00:18 | NA | NA | 32.41809 | 117.0938 |
| 2021-06-12 17:00:19 | NA | NA | 32.41575 | 117.1053 |
| 2021-06-12 17:00:20 | NA | NA | 32.40404 | 117.1328 |
| 2021-06-12 17:00:21 | NA | NA | 32.39704 | 117.1738 |
datetime long_WE lat_NS
Min. :2021-06-12 17:00:16 Length:3749 Length:3749
1st Qu.:2021-06-23 17:00:14 Class :character Class :character
Median :2021-07-04 17:00:26 Mode :character Mode :character
Mean :2021-07-04 11:44:29
3rd Qu.:2021-07-15 17:00:34
Max. :2021-07-25 17:01:16
NA's :1
mean_lat mean_long
Min. :31.32 Min. :117.1
1st Qu.:33.60 1st Qu.:120.5
Median :37.50 Median :123.2
Mean :39.37 Mean :122.9
3rd Qu.:45.08 3rd Qu.:124.8
Max. :52.21 Max. :130.5
NA's :457 NA's :527
Since we have the date and time (by the minute) of both the temperature and it’s coordinates, we can match the two variables together.
joined_temp_nav <- inner_join(all_temperature,
all_nav,
by = c("datetime" = "datetime"))
head(joined_temp_nav)
summary(joined_temp_nav)
| datetime | mean_ext | long_WE | lat_NS | mean_lat | mean_long |
|---|---|---|---|---|---|
| 2021-06-12 17:00:20 | 18.94975 | NA | NA | 32.40404 | 117.1328 |
| 2021-06-12 17:00:21 | 18.81057 | NA | NA | 32.39704 | 117.1738 |
| 2021-06-12 17:00:22 | 19.17494 | NA | NA | 32.39495 | 117.1988 |
| 2021-06-12 17:00:23 | 19.26122 | NA | NA | 32.39151 | 117.2244 |
| 2021-06-12 17:00:24 | 19.28124 | NA | NA | 32.39131 | 117.2250 |
| 2021-06-12 17:00:25 | 19.20406 | NA | NA | 32.39111 | 117.2255 |
datetime mean_ext long_WE
Min. :2021-06-12 17:00:20 Min. :10.17 Length:3744
1st Qu.:2021-06-23 17:00:17 1st Qu.:13.37 Class :character
Median :2021-07-04 17:00:28 Median :14.98 Mode :character
Mean :2021-07-04 12:17:59 Mean :15.31
3rd Qu.:2021-07-15 17:00:35 3rd Qu.:16.77
Max. :2021-07-25 17:01:16 Max. :21.71
lat_NS mean_lat mean_long
Length:3744 Min. :31.32 Min. :117.1
Class :character 1st Qu.:33.63 1st Qu.:120.5
Mode :character Median :37.50 Median :123.2
Mean :39.38 Mean :122.9
3rd Qu.:45.08 3rd Qu.:124.8
Max. :52.21 Max. :130.5
NA's :456 NA's :526
Visualize the Data¶
time_plot <- ggplot(all_temperature, aes(x = datetime,
y = mean_ext,
colour = mean_ext)) +
geom_point() +
scale_colour_gradient(low = "blue", high = "red") +
labs(x = "Date and Time PST",
y = "Mean (by the min) Ocean Temperature (celcius)",
colour = "Mean External Temperature")
time_plot
p<- plot_ly(joined_temp_nav,
x = ~mean_lat,
y = ~mean_long,
z = ~mean_ext,
color = ~mean_ext) %>%
add_markers(size = 0.7)
embed_notebook(p)